Selection of Relevant Features in Machine Learning

Author

  • Pat Langley
Abstract

In this paper, we review the problem of selecting relevant features for use in machine learning. We describe this problem in terms of heuristic search through a space of feature sets, and we identify four dimensions along which approaches to the problem can vary. We consider recent work on feature selection in terms of this framework, then close with some challenges for future work in the area.

1. The Problem of Irrelevant Features

The selection of relevant features, and the elimination of irrelevant ones, is a central problem in machine learning. Before an induction algorithm can move beyond the training data to make predictions about novel test cases, it must decide which attributes to use in these predictions and which to ignore. Intuitively, one would like the learner to use only those attributes that are 'relevant' to the target concept.

There have been a few attempts to define 'relevance' in the context of machine learning, as John, Kohavi, and Pfleger (1994) have noted in their review of this topic. Because we will review a variety of approaches, we do not take a position on this issue here. We will focus instead on the task of selecting relevant features (however defined) for use in learning and prediction.

Many induction methods attempt to deal directly with the problem of attribute selection, especially ones that operate on logical representations. For instance, techniques for inducing logical conjunctions do little more than add or remove features from the concept description. Addition and deletion of single attributes also constitute the basic operations of more sophisticated methods for inducing decision lists and decision trees. Some nonlogical induction methods, like those for neural networks and Bayesian classifiers, instead use weights to assign degrees of relevance to attributes. And some learning schemes, such as the simple nearest neighbor method, ignore the issue of relevance entirely.

We would like induction algorithms that scale well to domains with many irrelevant features. More specifically, we would like the sample complexity (the number of training cases needed to reach a given level of accuracy) to grow slowly with the number of irrelevant attributes. Theoretical results for algorithms that search restricted hypothesis spaces are encouraging. For instance, the worst-case number of errors made by Littlestone's (1987) WINNOW method grows only logarithmically with the number of irrelevant features. Pazzani and Sarrett's (1992) average-case analysis for WHOLIST, a simple conjunctive algorithm, and Langley and Iba's (1993) treatment of the naive Bayesian classifier, suggest that their sample complexities grow at most linearly with the number of irrelevant features.

However, the theoretical results are less optimistic for induction methods that search a larger space of concept descriptions. For example, Langley and Iba's (1993) average-case analysis of simple nearest neighbor indicates that its sample complexity grows exponentially with the number of irrelevant attributes, even for conjunctive target concepts. Experimental studies of nearest neighbor are consistent with this conclusion, and other experiments suggest that similar results hold even for induction algorithms that explicitly select features.
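The nearest neighbor effect is easy to reproduce. The sketch below is our illustration, not an experiment from the paper: it runs simple 1-NN on a conjunctive Boolean concept and measures test accuracy as random irrelevant attributes are added. With the training-set size held fixed, accuracy drifts toward the base rate because the noise attributes come to dominate the Hamming distance. The data generator, sample sizes, and attribute counts are all illustrative assumptions.

```python
# Illustrative simulation (not from the paper): simple 1-NN on a
# conjunctive Boolean concept with a growing number of irrelevant
# attributes. Sample sizes and the random seed are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_cases, n_irrelevant):
    # Boolean attributes; only the first three are relevant.
    X = rng.integers(0, 2, size=(n_cases, 3 + n_irrelevant))
    y = X[:, :3].all(axis=1).astype(int)   # target: conjunction of features 0-2
    return X, y

def nn_accuracy(n_irrelevant, n_train=100, n_test=200):
    X_train, y_train = make_data(n_train, n_irrelevant)
    X_test, y_test = make_data(n_test, n_irrelevant)
    # 1-NN under Hamming distance, which weights every attribute
    # equally, relevant or not.
    dists = (X_test[:, None, :] != X_train[None, :, :]).sum(axis=2)
    preds = y_train[dists.argmin(axis=1)]
    return (preds == y_test).mean()

for k in (0, 4, 8, 16, 32):
    print(f"{k:2d} irrelevant attributes -> test accuracy {nn_accuracy(k):.2f}")
```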
For example, the sample complexity for decision-tree methods, which do select attributes explicitly, appears to grow linearly with the number of irrelevant attributes for conjunctive concepts, but exponentially for parity concepts, since the evaluation metric cannot distinguish relevant from irrelevant features in the latter situation (Langley & Sage, in press). Results of this sort have encouraged machine learning researchers to explore more sophisticated methods for selecting relevant features. In the sections that follow, we present a general framework for this task, and then consider some recent examples of work on this important problem.

2. Feature Selection as Heuristic Search

One can view the task of feature selection as a search problem, with each state in the search space specifying a subset of the possible features. As Figure 1 depicts, one can impose a partial ordering on this space, with each child having exactly one more feature than its parents. The structure of this space suggests that any feature selection method must take a stance on four basic issues that determine the nature of the heuristic search process; the sketch below shows one simple point in this design space.
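As a concrete instance of that search (our sketch, not pseudocode from the paper), the following greedy forward-selection procedure starts from the empty feature set, generates children by adding a single feature, scores each state with a black-box `evaluate` function, and halts when no child improves on its parent. In a wrapper-style approach, `evaluate` might be the cross-validated accuracy of an induction algorithm restricted to that subset; the `toy_evaluate` below is a hypothetical stand-in.

```python
# Minimal greedy forward selection through the space of feature subsets
# (our sketch, not code from the paper). States are frozensets of
# feature indices; `evaluate` is an assumed black-box scoring function.

def forward_select(n_features, evaluate):
    current = frozenset()                    # starting point: the empty set
    best_score = evaluate(current)
    while len(current) < n_features:
        # Successor operator: children with exactly one more feature.
        children = [current | {f} for f in range(n_features) if f not in current]
        scores = {child: evaluate(child) for child in children}
        best_child = max(scores, key=scores.get)
        if scores[best_child] <= best_score: # halting criterion: no improvement
            break
        current, best_score = best_child, scores[best_child]
    return current

if __name__ == "__main__":
    # Hypothetical score: reward features 0-2, lightly penalize subset size.
    relevant = {0, 1, 2}
    toy_evaluate = lambda s: len(s & relevant) - 0.1 * len(s)
    print(sorted(forward_select(10, toy_evaluate)))   # -> [0, 1, 2]
```

Backward elimination is the mirror image, starting from the full feature set and deleting one attribute per step. The choices visible in this sketch, namely the starting point, the successor operator, the evaluation function, and the halting criterion, are exactly the kinds of commitments any such method must make.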

Similar Articles

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether a given page is a phishing webpage or not. Selecting appropriate features improves the performance of a PWDS; the performance criteria are detection accuracy and system response time. The major share of the time consumed by a PWDS arises from feature extraction that ...

Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...

Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques

Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been an activity in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides accuracy and confidence simultaneously before softwa...

Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification

Background: In the current study, a hybrid feature selection approach involving filter and wrapper methods is applied to some bioscience databases with various records, attributes and classes; hence, this strategy enjoys the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose the disease status and to estimate patient survival. Method...

Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability

Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. In this regard, an important issue is that preparing some features may require difficult and costly experiments, while those features have a less significant impact on the final decision and can be omitted from the feature set. Therefore, developing a machine for p...

Prostate cancer radiomics: A study on IMRT response prediction based on MR image features and machine learning approaches

Introduction: To develop different radiomic models based on radiomic features and machine learning methods to predict early intensity modulated radiation therapy (IMRT) response. Materials and Methods: Thirty prostate patients were included. All patients underwent pre- and post-IMRT T2-weighted and apparent diffusion coefficient (ADC) magnetic resonance imagi...

